Inkrementelle Koreferenzanalyse für das Deutsche
An incremental approach to coreference resolution for German texts is presented. Based on a broad empirical study, we show that an incremental method outperforms a non-incremental one and that, in each case, using several classifiers yields better results than using only one. In addition, we define a simple salience measure that yields results nearly as good as those of a sophisticated method based on machine learning. Preprocessing is performed exclusively by real components; we do not, as is so often done, fall back on perfect data (e.g. a treebank instead of a parser). The empirical results are correspondingly low. The approach operates with hard linguistic filters, which keep the set of antecedent candidates small. The evaluation is based on the coreference annotations of the TüBa-D/Z.
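The idea of incremental resolution with hard filters and a salience score can be illustrated with a minimal Python sketch. It is not the authors' system: the `Mention` fields, the role weights in `ROLE_SALIENCE`, the distance-based score, and the rule that only pronouns are treated as anaphors are all assumptions made for illustration. The point it shows is that each new mention is checked against whole, already-built entities rather than against isolated mention pairs.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    gender: str       # "m", "f" or "n"
    number: str       # "sg" or "pl"
    role: str         # grammatical role, e.g. "subj", "obj", "other"
    position: int     # index of the mention in the text
    is_pronoun: bool  # in this sketch only pronouns are resolved

# Hypothetical salience weights per grammatical role (an assumption).
ROLE_SALIENCE = {"subj": 3.0, "obj": 2.0, "other": 1.0}

def compatible(anaphor, entity):
    """Hard filter: the anaphor must agree with every mention of the entity."""
    return all(m.gender == anaphor.gender and m.number == anaphor.number
               for m in entity)

def salience(anaphor, entity):
    """Toy salience: role weight of the entity's latest mention over distance."""
    last = entity[-1]
    distance = max(1, anaphor.position - last.position)
    return ROLE_SALIENCE.get(last.role, 1.0) / distance

def resolve(mentions):
    """Process mentions left to right, growing entities incrementally."""
    entities = []
    for m in mentions:
        candidates = [e for e in entities if compatible(m, e)] if m.is_pronoun else []
        if candidates:
            # Attach the pronoun to the most salient compatible entity.
            max(candidates, key=lambda e: salience(m, e)).append(m)
        else:
            # Non-pronouns, and pronouns without a candidate, open a new entity.
            entities.append([m])
    return entities

mentions = [
    Mention("Anna", "f", "sg", "subj", 0, False),
    Mention("Buch", "n", "sg", "obj", 1, False),
    Mention("sie", "f", "sg", "subj", 2, True),
    Mention("es", "n", "sg", "obj", 3, True),
]
chains = [[m.text for m in e] for e in resolve(mentions)]
print(chains)
```

Because the agreement filter runs before scoring, the salience function only ever ranks a small candidate set, which mirrors the role the abstract assigns to the hard linguistic filters.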
Real Anaphora Resolution is Hard
We introduce a system for anaphora resolution for German that uses various resources in order to develop a real system, as opposed to systems based on idealized assumptions such as the use of true mentions only, or of perfect parse trees and perfect morphology. The components that we use to replace such idealizations comprise a full-fledged morphology, a Wikipedia-based named entity recognition, a rule-based dependency parser and a German wordnet. We show that under these conditions coreference resolution is (at least for German) still far from perfect.
Information Extraction From Chemical Patents
The development of new chemicals or pharmaceuticals is preceded by an in-depth analysis of published patents in the field. This information retrieval is a costly and time-inefficient step when done by a human reader, yet it is mandatory for the potential success of an investment. The goal of the research project UIMA-HPC is to automate and hence speed up the process of knowledge mining about patents. Multi-threaded analysis engines, developed according to UIMA (Unstructured Information Management Architecture) standards, process texts and images in thousands of documents in parallel. UNICORE (UNiform Interface to COmputing Resources) workflow control structures make it possible to dynamically allocate resources for every given task to gain the best CPU-time/real-time ratios in an HPC environment.
Anaphora Resolution with Real Preprocessing
In this paper we focus on anaphora resolution for German, a highly inflected language which also allows for closed-form compounds (i.e. compounds without spaces). In particular, we describe a system that uses only real preprocessing components, e.g. a dependency parser, a two-level morphological analyser, etc. We trace the performance drop occurring under these conditions back to underspecification and ambiguity at the morphological level. A demanding subtask of anaphora resolution is the resolution of so-called bridging anaphora, a special variant of nominal anaphora where the heads of the coreferent noun phrases do not match. We experiment with two different resources in order to find out how best to cope with this problem.
Spin Asymmetries In Diffractive Leptoproduction
In this report we calculate the cross section and asymmetry for diffractive leptoproduction. We study the dependence of the asymmetry on the structure of the Pomeron-proton coupling.
Comment: 4 pages, LaTeX, two PS figures, presented at the International Workshop "Symmetry and Spin" PRAHA'9
Just Friends / music by John Klenner; words by Sam M. Lewis
Cover: photo of Red McKenzie; Publisher: Robbins Music Corporation (New York)
https://egrove.olemiss.edu/sharris_e/1016/thumbnail.jp
United we stand: improving sentiment analysis by joining machine learning and rule based methods
In the past, we have successfully used machine learning approaches for sentiment analysis. In the course of those experiments, we observed that our machine learning method, although able to cope well with figurative language, could not always reach a certain decision about the polarity orientation of sentences, yielding erroneous evaluations. We support the conjecture that these cases, which bear mild figurativeness, could be better handled by a rule-based system. These two systems, acting complementarily, could bridge the gap between machine learning and rule-based approaches. Experimental results using the corpus of the Affective Text Task of SemEval ’07 provide evidence in favor of this direction.
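One way to picture the complementary setup described in this abstract is a confidence-based fallback, sketched below in Python. This is an assumption, not the paper's documented method: the abstract does not say how the two systems are combined, and the threshold of 0.7, the tiny word lists, and the `toy_ml_predict` stub (which stands in for a trained classifier) are all hypothetical.

```python
# Tiny illustrative lexicons (hypothetical, not from the paper).
POSITIVE = {"win", "joy", "success"}
NEGATIVE = {"crash", "fear", "loss"}

def rule_based_polarity(sentence):
    """Toy rule-based system: count lexicon hits and compare."""
    tokens = sentence.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score >= 0 else "negative"

def combined_polarity(sentence, ml_predict, threshold=0.7):
    """Trust the ML label when it is confident, else defer to the rules."""
    label, confidence = ml_predict(sentence)
    if confidence >= threshold:
        return label
    return rule_based_polarity(sentence)

def toy_ml_predict(sentence):
    """Stub classifier returning (label, confidence); assumed for the demo."""
    if "market" in sentence.lower():
        return ("positive", 0.55)  # an uncertain decision
    return ("negative", 0.9)       # a confident decision

uncertain = combined_polarity("Markets crash amid fear", toy_ml_predict)
confident = combined_polarity("Storm hits coast", toy_ml_predict)
print(uncertain, confident)
```

In this sketch the rule-based system is consulted only for sentences where the classifier is unsure, which is one plausible reading of the two systems "acting complementarily".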